fix(mcp): add timeout to background thread join in stop() to prevent hangs#1792
Open
giulio-leone wants to merge 3 commits intostrands-agents:mainfrom
Open
fix(mcp): add timeout to background thread join in stop() to prevent hangs#1792giulio-leone wants to merge 3 commits intostrands-agents:mainfrom
giulio-leone wants to merge 3 commits intostrands-agents:mainfrom
Conversation
…hangs When an Agent holding an MCPClient goes out of scope inside a function, the Agent.__del__ finalizer calls MCPClient.stop() which calls _background_thread.join(). If the background thread cannot exit promptly (e.g. transport subprocess teardown is slow), join() blocks indefinitely, causing the entire process to hang on exit. Add a timeout (equal to startup_timeout, default 30s) to the join() call. If the thread does not exit within the timeout, log a warning and proceed with cleanup. The thread is already a daemon thread (daemon=True), so it will be cleaned up by the interpreter on process exit. Fixes strands-agents#1732 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Prevents process hangs on interpreter shutdown by ensuring MCPClient.stop() doesn’t block indefinitely when joining the background thread (notably when called from Agent.__del__).
Changes:
- Add a timeout (based on
startup_timeout) to_background_thread.join()inMCPClient.stop() - Log a warning when the join times out and continue cleanup
- Update/add tests to assert
join(timeout=...)is used and exercise the timeout path
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/strands/tools/mcp/mcp_client.py |
Adds join(timeout=...) + warning when the background thread doesn’t exit promptly. |
tests/strands/tools/mcp/test_mcp_client.py |
Updates existing stop() tests for the new join timeout behavior and adds a new timeout-path test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…imeout When join() times out and the thread is still alive, return early without closing the event loop or resetting internal state. This prevents RuntimeError from closing a running loop and avoids allowing a second start() while teardown is still in progress. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Author
|
All CI checks pass. Ready for review. |
Expand the docstring to explicitly describe the expected behavior: skip event loop close and state reset when thread is still alive.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When an
Agentholding anMCPClientis created inside a function and goes out of scope, theAgent.__del__finalizer callsMCPClient.stop()which invokes_background_thread.join(). If the background thread cannot exit promptly (e.g. transport subprocess teardown is slow),join()blocks indefinitely, causing the entire process to hang on exit.Changes
startup_timeout, default 30s) to thejoin()call instop()daemon=True), so it will be cleaned up by the interpreter on process exitTesting
join()is called with timeouttest_stop_does_not_hang_when_join_times_outcovering the timeout pathFixes #1732